feat(q4_0): FP32→Q4_0 quantizer (loader-agnostic production) by michalharakal · Pull Request #651 · SKaiNET-developers/SKaiNET

michalharakal · 2026-05-30T17:58:31Z

Phase B — the produce side of Q4_0. Q4_0 was decode-only (GGUF arrives pre-quantized); this adds Q4_0Quantizer in commonMain so any source of dense FP32 weights — a SafeTensors/JSON loader, an in-memory tensor, an offline tool — can emit canonical ggml Q4_0 blocks without going through GGUF. This is the loader-agnostic primitive that "Q4_0 from any loader" actually requires.

What

Q4_0Quantizer.quantizeToBytes(FloatArray) / .quantize(FloatArray, Shape): Q4_0BlockTensorData.
Matches ggml quantize_row_q4_0: for each 32-element block, d = max / -8 (where max is the signed element of largest magnitude), code = clamp(round(x/d + 8), 0, 15), canonical split packing, and the scale stored as round-to-nearest FP16.

Tests

Q4_0QuantizerTest — round-trip within 4-bit error, max-element recovery, zero-block, validation.
Q4_0QuantizeRoundTripMatmulTest — quantized weights run through ctx.ops.matmul and track the dense FP32 result, proving the quantizer output is consumable by the scalar/Panama/native kernels (Phase B ↔ Phase A).
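
The round-trip checks rely on the decode side, which can be sketched in a few lines. The Python below is an illustration only (the name `dequantize_q4_0` is hypothetical, and the block layout is assumed from the ggml Q4_0 format: FP16 scale, then 16 bytes with element `j` in the low nibble and element `j + 16` in the high nibble). It also shows why max-element recovery works: the max-magnitude element is assigned code 0, and `(0 - 8) * d` equals `max` up to the FP16 rounding of `d`.

```python
import struct

BLOCK = 32
BLOCK_BYTES = 2 + BLOCK // 2  # FP16 scale + 16 packed nibble pairs = 18


def dequantize_q4_0(data):
    """Illustrative sketch: expand 18-byte Q4_0 blocks back to floats."""
    if len(data) % BLOCK_BYTES != 0:
        raise ValueError("length must be a multiple of 18 bytes")
    out = []
    for off in range(0, len(data), BLOCK_BYTES):
        (d,) = struct.unpack_from("<e", data, off)   # little-endian FP16 scale
        qs = data[off + 2:off + BLOCK_BYTES]
        lows = [((b & 0x0F) - 8) * d for b in qs]    # elements 0..15
        highs = [((b >> 4) - 8) * d for b in qs]     # elements 16..31
        out.extend(lows + highs)
    return out


# Hand-built block: scale 0.5, first byte 0xF0 (code 0 low, code 15 high),
# remaining bytes 0x88 (code 8 in both nibbles, which decodes to zero).
block = struct.pack("<e", 0.5) + bytes([0xF0]) + bytes([0x88] * 15)
values = dequantize_q4_0(block)
print(values[0], values[16], values[1])  # -4.0 3.5 0.0
```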

Deliberately deferred

Automatic on-load quantization via a loader policy is not wired here. DTypePolicy targets logical DType, not TensorEncoding — so requesting "Q4_0" needs a new encoding-policy type, an RFC-level API decision (parallel to #615) the maintainer should own. This PR ships the reusable primitive every such path would call; the policy hook is a clean follow-up.

Targeting 0.27.0. Stack #647→#650 already merged to develop; this branches off develop. Next: PR5 (docs).

🤖 Generated with Claude Code

Adds Q4_0Quantizer in commonMain: the produce side Q4_0 was missing (it was decode-only, since GGUF arrives pre-quantized). Now any source of dense FP32 weights (a SafeTensors/JSON loader, an in-memory tensor, an offline tool) can emit canonical ggml Q4_0 blocks without GGUF.

Algorithm matches ggml quantize_row_q4_0: per 32-element block, scale d = max/-8 (max = signed max-magnitude element), code = clamp(round(x/d + 8), 0, 15), packed in the canonical split layout; scale stored as round-to-nearest FP16.

Tests:
- Q4_0QuantizerTest: round-trips through Q4_0TensorData.toFloatArray within 4-bit error, recovers the max element, zero stays zero.
- Q4_0QuantizeRoundTripMatmulTest: quantized weights run through the matmul dispatch and track the dense FP32 result, proving the quantizer output is consumable by the (scalar/Panama/native) kernels.

Note: automatic on-load quantization via a loader policy is deliberately NOT wired here. DTypePolicy targets logical DType, not TensorEncoding, so requesting "Q4_0" needs a new encoding-policy type, an RFC-level API decision (parallel to #615) the maintainer should own. This PR ships the reusable primitive every such path would call.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-30T18:00:42Z

📖 Documentation Preview

The documentation has been built successfully for this PR.

Generated Files:

Operator documentation: docs/modules/operators/_generated_/
JSON schema output: operators.json

Artifacts:

Download the documentation-preview-651 artifact to view the complete documentation locally.

This comment will be updated automatically when the PR is updated.

michalharakal mentioned this pull request May 30, 2026

docs(q4_0): changelog + quantized-kernels page for first-class Q4_0 #652

Merged

michalharakal merged commit 8fb4168 into develop May 30, 2026
10 checks passed

michalharakal deleted the feature/q4_0-quantizer branch May 30, 2026 18:05

michalharakal merged 1 commit into develop from feature/q4_0-quantizer
